These tutorials are a simplified introduction and are not sufficient on
their own to achieve system safety. You are responsible for the safety of
your system.

“Never tell me the odds!”
- Han Solo

© 2020 Philip Koopman

■ Anti-Patterns for Critical Systems:
  ● You haven't characterized worst case failures
  ● You haven't assigned SILs to system hazards
  ● Validation plan doesn't match fleet exposure


■ Critical systems require low failure rates
  ● SIL = Safety Integrity Level
    - Higher level of integrity needed for higher risk
  ● Safety critical: loss of life, injury, environmental damage
    - Special care must be taken to avoid deaths
  ● Mission critical: brand tarnish, financial loss, company failure
    - Consider a safety critical approach


Knight Capital Says Trading Glitch Cost It $440 Million
By NATHANIEL POPPER
Runaway Trades Spread Turmoil Across Wall St.

The Knight Capital Group announced on Thursday that it lost $440 million when it sold all the stocks it accidentally bought Wednesday morning because a computer glitch.
https://goo.gl/7dHOjO

■ Worst case might not be obvious
  ● Aircraft - software can cause a crash
  ● Thermostats/HVAC - software can freeze plumbing
    - Can (rarely!) also kill small children due to overheating

■ Key thought experiment:
  ● What's the worst that can happen if ... your system intentionally tried to cause harm?
  ● This identifies system hazards to mitigate

■ Failure consequence varies, typically:
  ● Multiple fatalities (e.g., plane crash)
  ● Single fatality (e.g., single-vehicle car crash)
  ● Severe injuries
  ● Minor injuries
  ● Can consider analogies for mission-critical goals

What Is The Worst Case Failure?

Malfunctioning heater leads to Fort Worth toddler's death
WFAA Channel 8 https://goo.gl/rFd8qWw
Takeaway: get a baby monitor with temperature sensor

SIL represents:
  ● The risk presented by a system-level hazard
  ● The engineering rigor applied to mitigate the risk
  ● The permissible residual probability after mitigation

Example: DO-178 (aviation flight hours)
  DAL A (Catastrophic): 10^9 hrs/failure = 114077 years
  DAL B (Hazardous):    10^7 hrs/failure = 1141 years
  DAL C (Major):        10^5 hrs/failure = 11 years
  DAL D (Minor):        10^3 hrs/failure = 42 days

Example: IEC 61508 (industrial controls)
  SIL 4: 10^8 hrs/dangerous failure = 11408 years
  SIL 3: 10^7 hrs/dangerous failure = 1141 years
  SIL 2: 10^6 hrs/dangerous failure = 114 years
  SIL 1: 10^5 hrs/dangerous failure = 11 years
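
The year figures in the two lists above are just the hours-per-failure targets converted to calendar time. A minimal sketch of that conversion follows; the function names are illustrative, and the 8766-hour year (365.25 days) is an assumption, chosen because it reproduces the rounded figures on the slide.

```python
# Convert "hours per failure" targets into continuous-operation calendar time.
# Assumption: one year = 365.25 days = 8766 hours (matches the slide's rounding).
HOURS_PER_DAY = 24
HOURS_PER_YEAR = HOURS_PER_DAY * 365.25


def hours_to_years(hours: float) -> float:
    """Years of continuous operation represented by the given number of hours."""
    return hours / HOURS_PER_YEAR


def hours_to_days(hours: float) -> float:
    """Days of continuous operation represented by the given number of hours."""
    return hours / HOURS_PER_DAY


# IEC 61508 figures as listed above (hours per dangerous failure).
IEC_61508_HOURS = {"SIL 4": 1e8, "SIL 3": 1e7, "SIL 2": 1e6, "SIL 1": 1e5}

if __name__ == "__main__":
    for level, hours in IEC_61508_HOURS.items():
        print(f"{level}: {hours:.0e} hrs = {hours_to_years(hours):,.0f} years")
    print(f"DAL D: 1e+03 hrs = {hours_to_days(1e3):.0f} days")
```

These are durations for a single unit running nonstop; they say nothing yet about how many units are in the field.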

1984: Bhopal Chemical Plant
Thousands of deaths
(not software related; pre-dates IEC 61508)
https://en.wikipedia.org/wiki/Bhopal_disaster
https://goo.gl/GGHWRn

Higher SIL Invokes More Rigor

■ Example: IEC 61508
  ● HR = Highly Recommended
  ● R = Recommended
  ● NR = Not Recommended (don't do this)

■ SIL 1: lowest integrity level (low risk)
■ SIL 4: highest integrity level (unacceptable risk)

[Table excerpt from IEC 61508 listing techniques with a rating per SIL; the individual ratings are not legible in this copy]
  3a  Failure assertion programming
  3b  Safety bag techniques
  3c  Diverse programming
  3d  Recovery block
  3e  Backward recovery
  3f  Forward recovery
  3g  Re-try fault recovery mechanisms
  3h  Memorising executed cases
  5   Artificial intelligence - fault correction
  6   Dynamic reconfiguration
  7a  Structured methods including, for example, JSD, MASCOT, SADT and Yourdon
      Formal methods including, for example, CCS, CSP, HOL
      Computer-aided specification tools

Fleet Exposure & Probability

■ Bigger fleets have increased exposure
  ● 250 Million US vehicles @ 1 hour/day
    = 2.5 * 10^8 hrs/day exposure
  ● If “unlikely” failures happen every million hours...
    that's: 2.5 * 10^8 hrs / 10^6 hrs per event
    => 250 events every day


  ● This is why 10^7 to 10^9 hrs is a typical goal
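
The fleet arithmetic above can be reproduced in a few lines. This is a sketch only: the function names are made up for illustration, and the fleet size and usage are the slide's round numbers, not measured data.

```python
def fleet_exposure_hours_per_day(units: float, hours_per_unit_per_day: float) -> float:
    """Total operating hours accumulated across the whole fleet in one day."""
    return units * hours_per_unit_per_day


def expected_events_per_day(exposure_hours_per_day: float, hours_per_event: float) -> float:
    """Average number of failure events per day at a given mean hours between events."""
    return exposure_hours_per_day / hours_per_event


if __name__ == "__main__":
    # Slide's example: 250 million vehicles, each driven about 1 hour per day.
    exposure = fleet_exposure_hours_per_day(250e6, 1.0)
    print(f"exposure: {exposure:.1e} hrs/day")

    # Sweep the mean hours between failures to see how the daily count changes.
    for hours_per_event in (1e6, 1e7, 1e8, 1e9):
        events = expected_events_per_day(exposure, hours_per_event)
        print(f"{hours_per_event:.0e} hrs/event -> {events:g} events/day")
```

Sweeping the hours-per-event value shows why a rate that sounds rare for one unit is a daily occurrence at fleet scale.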

■ Hardware components fail at ~10^5-10^6 hrs
  ● Need two independently failing components to get to 10^9 hours!
    - This motivates redundancy for life-critical applications (SIL 3 & SIL 4)
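
A rough way to see why two components are needed: if failures are truly independent, the chance that both fail in the same hour is the product of the individual per-hour probabilities. The sketch below assumes an illustrative per-component figure of one failure per 10^5 hours, perfect independence (no common-cause faults), and that a single failure is detected and repaired before the next hour. Real redundancy analyses have to justify each of those assumptions.

```python
def both_fail_probability_per_hour(p_a: float, p_b: float) -> float:
    """Probability that two independent components both fail within the same hour.

    Valid only under the independence assumption: no shared power supply,
    shared software defect, or other common-cause failure.
    """
    return p_a * p_b


if __name__ == "__main__":
    single = 1e-5  # assumed: one component fails about once per 10^5 hours
    pair = both_fail_probability_per_hour(single, single)
    print(f"single component: {single:.0e} per hour")
    print(f"redundant pair:   {pair:.0e} per hour")
    print("meets a 1e-9 per hour target:", pair <= 1e-9)
```

The multiplication stops being valid as soon as the two channels can fail for the same reason.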


■ For mission-critical systems, consider:
  ● Fleet exposure = # units * operational hours/unit
  ● Number of acceptable failures
  ● Compute failure rate = failures / hours; pick an appropriate SIL
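
The three steps above can be sketched as a small calculation. Everything here is illustrative: the fleet numbers are hypothetical, the function names are invented for this example, and the SIL thresholds are the IEC 61508 hours-per-dangerous-failure figures quoted earlier in these slides. An actual SIL assignment comes from the hazard and risk analysis required by the relevant standard, not from this arithmetic alone.

```python
from typing import Optional

# IEC 61508 figures as quoted earlier in these slides (hours per dangerous failure).
SIL_HOURS = [("SIL 1", 1e5), ("SIL 2", 1e6), ("SIL 3", 1e7), ("SIL 4", 1e8)]


def required_hours_per_failure(units: float, hours_per_unit: float,
                               acceptable_failures: float) -> float:
    """Fleet exposure divided by the number of failures the business can accept."""
    if acceptable_failures <= 0:
        raise ValueError("acceptable_failures must be positive")
    fleet_exposure = units * hours_per_unit
    return fleet_exposure / acceptable_failures


def pick_sil(target_hours_per_failure: float) -> Optional[str]:
    """Lowest SIL whose listed figure meets the target, or None if none does."""
    for name, hours in SIL_HOURS:
        if hours >= target_hours_per_failure:
            return name
    return None


if __name__ == "__main__":
    # Hypothetical fleet: 10,000 units, 2,000 operating hours each over the
    # product lifetime, and at most 1 failure accepted across the fleet.
    target = required_hours_per_failure(10_000, 2_000, 1)
    print(f"target: {target:.1e} hrs/failure -> {pick_sil(target)}")
```

Returning `None` when the target exceeds the SIL 4 figure is a deliberate choice in this sketch: it signals that the goal is beyond what the table covers rather than silently picking the top level.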


Lime halts scooter service in Switzerland after possible software glitch throws users off mid-ride
Ingrid Lunden
https://techcrunch.com/2019/01/12/lime-scooters-switzerland-bumps/

“Recently we detected a bug in the firmware of our scooter fleet that under rare circumstances could cause sudden excessive braking during use.”
https://www.li.me/second-street/safety-update-february-2019

Best Practices For Critical Systems

■ Characterize worst case failure scenarios
  ● Assign SIL based on relevant safety standard
  ● Use engineering rigor for software SIL
  ● Use redundancy for ultra-low failure rates
  ● Consider fleet exposure, not just single unit

■ Pitfalls:
  ● Software redundancy is difficult, and diversity is usually impracticable
  ● Designer's intuition about “realistic” faults is usually optimistic
    - At 10^-9/hr, random chance is a close approximation of a malicious adversary
  ● Going through the motions is not enough for a SIL-based process

[ASKING AIRCRAFT DESIGNERS ABOUT AIRPLANE SAFETY:]
NOTHING IS EVER FOOLPROOF, BUT MODERN AIRLINERS ARE INCREDIBLY RESILIENT. FLYING IS THE SAFEST WAY TO TRAVEL.

[ASKING BUILDING ENGINEERS ABOUT ELEVATOR SAFETY:]
ELEVATORS ARE PROTECTED BY MULTIPLE TRIED-AND-TESTED FAILSAFE MECHANISMS. THEY'RE NEARLY INCAPABLE OF FALLING.

[ASKING SOFTWARE ENGINEERS ABOUT COMPUTERIZED VOTING:]
THAT'S TERRIFYING.
DON'T TRUST VOTING SOFTWARE. AND DON'T LISTEN TO ANYONE WHO TELLS YOU IT'S SAFE.
WHY?
I DON'T QUITE KNOW HOW TO PUT THIS, BUT OUR ENTIRE FIELD IS BAD AT WHAT WE DO AND IF YOU RELY ON US, EVERYONE WILL DIE.
THEY SAY THEY'VE FIXED IT WITH SOMETHING CALLED 'BLOCKCHAIN.'
AAAAA!!!
WHATEVER THEY SOLD YOU, DON'T TOUCH IT.
BURY IT IN THE DESERT.

https://xkcd.com/2030/